A novel electronic health record-based, machine-learning model to predict severe hypoglycemia leading to hospitalizations in older adults with diabetes: A territory-wide cohort and modeling study

Background
Older adults with diabetes are at high risk of severe hypoglycemia (SH). Many machine-learning (ML) models that predict short-term hypoglycemia are not specific for older adults and show poor precision-recall. We aimed to develop a multidimensional, electronic health record (EHR)-based ML model to predict one-year risk of SH requiring hospitalization in older adults with diabetes.

Methods and findings
We adopted a case-control design for a retrospective territory-wide cohort of 1,456,618 records from 364,863 unique older adults (age ≥65 years) with diabetes and at least 1 Hong Kong Hospital Authority attendance from 2013 to 2018. We used 258 predictors including demographics, admissions, diagnoses, medications, and routine laboratory tests in a one-year period to predict SH events requiring hospitalization in the following 12 months. The cohort was randomly split into training, testing, and internal validation sets in a 7:2:1 ratio. Six ML algorithms were evaluated: logistic regression, random forest, gradient boosting machine, deep neural network (DNN), XGBoost, and Rulefit. We tested our model in a temporal validation cohort in the Hong Kong Diabetes Register (HKDR) with predictors defined in 2018 and outcome events defined in 2019. Predictive performance was assessed using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and positive predictive value (PPV). We identified 11,128 SH events requiring hospitalization during the observation periods. The XGBoost model yielded the best performance (AUROC = 0.978 [95% CI 0.972 to 0.984]; AUPRC = 0.670 [95% CI 0.652 to 0.688]; PPV = 0.721 [95% CI 0.703 to 0.739]). This was superior to an 11-variable conventional logistic-regression model comprising age, sex, history of SH, hypertension, blood glucose, kidney function measurements, and use of oral glucose-lowering drugs (GLDs) (AUROC = 0.906; AUPRC = 0.085; PPV = 0.468). Top impactful predictors included non-use of lipid-regulating drugs, in-patient admission, urgent emergency triage, insulin use, and history of SH. External validation in the HKDR cohort yielded an AUROC of 0.856 [95% CI 0.838 to 0.873]. Main limitations of this study included limited transportability of the model and lack of geographically independent validation.

Conclusions
Our novel ML model demonstrated good discrimination and high precision in predicting one-year risk of SH requiring hospitalization. This may be integrated into EHR decision support systems for preemptive intervention in older adults at highest risk.


Author summary
Why was this study done?
• Older adults with diabetes are at high risk of severe hypoglycemia (SH) requiring hospitalization.
• Existing machine-learning (ML) models that predict short-term hypoglycemia are not specific for older adults and show poor precision-recall.
• A simple tool to identify those at risk for developing SH in T2D is needed.
What did the researchers do and find?
• We included 1,456,618 records of 364,863 unique older adults (age ≥65 years) with diabetes and at least 1 Hong Kong Hospital Authority attendance in 2013 to 2018.
• We used 258 predictors including demographics, admissions, diagnoses, medications, and routine laboratory tests in a one-year period to predict SH events requiring hospitalization in the following 12 months.
• The XGBoost model yielded the best performance, superior to an 11-variable conventional logistic-regression model.

What do these findings mean?
• Our novel ML model demonstrated good discrimination and high precision in predicting one-year risk of SH requiring hospitalization.

Introduction
Severe hypoglycemia (SH), distinguished from general hypoglycemia by the need for assistance from a third party, is a feared complication in the management of diabetes in older adults [1]. According to the multicenter Hypoglycemia Assessment Tool (HAT) Study, 83% of people with type 1 diabetes (T1D) and 46.5% of insulin-treated people with type 2 diabetes (T2D) had ever reported hypoglycemia [2,3]. Multiple risk factors contribute to increased risk of SH in older adults, including long disease duration, decline in hypoglycemia awareness, renal impairment, cognitive dysfunction, and insulin use [3]. In Hong Kong, people with diabetes aged ≥75 years had the highest rate of hospitalization due to SH compared with younger adults aged 45 to 59 years (6.0 versus 2.9 events/100-person-years) [4]. Apart from prolonged hospitalization and high healthcare expenditure, SH is associated with increased risk of cardiovascular (CV) disease, falls, dementia, and all-cause mortality [5]. In a recent survey, most US physicians rarely de-intensified or switched hypoglycemia-causing medications in high-risk older adults [6]. International guidelines recommend screening for "geriatric syndromes" including polypharmacy as part of an extended diabetes complication assessment in older adults [7]. This calls for a systematic paradigm for predicting SH risk in older adults, followed by personalized prevention and treatment strategies to avoid SH events and related comorbidities [8].

There is a need for a model specifically designed for SH in older adults with diabetes, as opposed to risk prediction in the general population of people with diabetes. SH risk prediction models have traditionally been developed using physiological and clinical variables and conventional statistical methods [9-13]. Karter and colleagues proposed a 6-variable risk stratification tool that categorized patients' 12-month risk of hypoglycemia-related emergency department (ED) attendance or hospitalization [9]. The predictors included number of episodes of hypoglycemia-related utilization, insulin use, sulfonylurea (SU) use, prior year emergency room use, kidney disease, and age (c-statistic of 0.83). Most of these models demonstrated high performance in terms of area under the receiver operating characteristic curve (AUROC), reaching over 80% [9-13]. However, since SH is relatively rare in people with diabetes, the high AUROC of a prediction model may be driven by the accurate identification of those at extremely low risk of SH (i.e., true negatives), who were usually the majority in the training cohorts. In such unbalanced datasets, it may be more important to maximize precision-recall, or the ability to predict the rare occurrence of a positive SH event [14]. A high proportion of false positives could lead to unnecessary intervention or de-intensification of treatment in low-risk individuals and inefficient resource utilization.

Unfortunately, few published electronic health record (EHR)-based models for SH have evaluated precision-recall. In a recent study, Ruan and colleagues used an EHR-based model with laboratory and clinical variables to predict short-term inpatient hypoglycemia [15]. The best performing model was based on the machine-learning (ML) algorithm XGBoost, which yielded both high AUROC and precision-recall. Time series records of clinical variables are necessary for developing models that forecast SH events [16]. Hong Kong has a unique territory-wide EHR system that covers 90% of older adults aged 65 or above in the city [17]. In this study, making use of the comprehensive, multidimensional data available in the local EHR system, we aimed to develop a novel ML-based model for predicting one-year risk of SH requiring hospitalization in older adults with diabetes. We anticipated that the proposed model could be embedded in a decision support system (DSS) to provide regular SH risk screening for older Chinese people with diabetes.
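
The argument above, that a high AUROC can coexist with many false positives when events are rare, can be made concrete with a short worked calculation. The numbers are purely illustrative assumptions and not results from this study: a prevalence of π = 0.01 and a classifier with sensitivity (Se) and specificity (Sp) both equal to 0.90.

```latex
\mathrm{PPV}
  = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}
  = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.10 \times 0.99}
  = \frac{0.009}{0.108}
  \approx 0.083
```

Under these assumed values, roughly 11 of every 12 individuals flagged as high risk would be false positives, even though sensitivity and specificity both look strong.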

Dataset
In Hong Kong, the Hospital Authority (HA) established a Big Data Analytics Platform, the Hospital Authority Data Collaboration Lab (HADCL), to support and facilitate territory-wide secure sharing of EHR-based datasets. HADCL provides anonymized data covering a broad range of patient information collected from all public hospitals and clinics in Hong Kong. The EHR system provides an integrated, longitudinal, lifelong view of each patient's health status and clinical outcomes, including comprehensive medication and laboratory records, hospitalization, residential area (linked to poverty index), health service utilization, comorbidity, and procedure data [17]. We extracted a retrospective dataset from the HADCL, consisting of patients aged 65 years or above with any in-patient admissions or out-patient attendances from January 1, 2013 to December 31, 2018. All data used were collected for routine patient management, with no additional data input required for the modeling [18]. The dataset contains patient demographics, living districts, utilization of health care resources (in-patient admissions, transfer and discharge, out-patient admissions, ED attendance), disease diagnosis based on the International Classification of Diseases Ninth Revision (ICD-9) codes, medication dispensing data, and laboratory tests. Personal information was removed during the analysis procedure. Ethics approval was obtained from the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee (CREC-2021.050).

Study reporting
This study is reported as per the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline (S1 Checklist).

Study population
People with diabetes were defined as those meeting any one of the following criteria [4]: (1) a diagnosis code for diabetes based on ICD-9 code of 250.xx from specialist out-patient clinics (SOPCs) and during hospitalization; (2) diagnosis code for diabetes based on the International Classification of Primary Care, Second Edition (ICPC-2): T89 or T90 at the general out-patient clinics (GOPCs); (3) HbA1c ≥6.5% in any 1 available measurement; (4) fasting plasma glucose (FPG) ≥7.0 mmol/L in any 1 available laboratory measurement; (5) prescription of any glucose-lowering drugs (GLDs); or (6) long-term prescription of insulin for at least 28 consecutive days.
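
As an illustration only, the six criteria above can be written as a single "any of" predicate. The sketch below is not the study's extraction code: the `PatientRecord` class and all of its field names are hypothetical stand-ins for whatever the source EHR tables provide, and the clinic setting of each diagnosis code is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PatientRecord:
    # All field names are hypothetical; they do not reflect the HADCL schema.
    icd9_codes: Sequence[str] = ()        # SOPC or in-patient diagnosis codes
    icpc2_codes: Sequence[str] = ()       # GOPC diagnosis codes
    hba1c_percent: Sequence[float] = ()   # all available HbA1c values (%)
    fpg_mmol_l: Sequence[float] = ()      # all available FPG values (mmol/L)
    any_gld_prescription: bool = False    # any glucose-lowering drug
    longest_insulin_run_days: int = 0     # longest consecutive insulin supply


def has_diabetes(p: PatientRecord) -> bool:
    """Return True if the record meets any one of the six criteria."""
    return (
        any(code.startswith("250.") for code in p.icd9_codes)      # (1)
        or any(code in ("T89", "T90") for code in p.icpc2_codes)   # (2)
        or any(value >= 6.5 for value in p.hba1c_percent)          # (3)
        or any(value >= 7.0 for value in p.fpg_mmol_l)             # (4)
        or p.any_gld_prescription                                  # (5)
        or p.longest_insulin_run_days >= 28                        # (6)
    )
```

For example, a record whose only evidence is a single HbA1c of 6.5% would be included, whereas a record with an FPG of 6.9 mmol/L and nothing else would not.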

Study design and outcome
The current study adopted a case-control design. The primary outcome for the cases was hospitalization due to SH, as defined by the principal hospital discharge diagnosis ICD-9 codes (250.80, 250.81, 250.82, 250.83, and 250.30-250.33) [4]. A detailed description of the definition is provided in S1 Table.
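
A minimal sketch of how this outcome definition might be applied to a discharge record is shown below. The function name is hypothetical, and expanding the range 250.30-250.33 into four individual codes is our reading of the definition rather than a detail taken from S1 Table.

```python
# ICD-9 codes listed in the outcome definition; the 250.30-250.33 range is
# expanded to individual codes here as an assumption.
SH_ICD9_CODES = frozenset({
    "250.80", "250.81", "250.82", "250.83",
    "250.30", "250.31", "250.32", "250.33",
})


def is_sh_hospitalization(principal_discharge_code: str) -> bool:
    """True if the principal discharge diagnosis is one of the SH codes."""
    return principal_discharge_code.strip() in SH_ICD9_CODES
```

Only the principal discharge diagnosis is checked, which mirrors the definition above; secondary diagnoses are deliberately ignored in this sketch.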

Predictors
We curated multidimensional variables from the integrated EHR dataset [1,19], which included sociodemographic characteristics, living districts (which are linked to the average income of the residents as an index of social deprivation and poverty), utilization of health care resources (including admissions, clinic visits, consultations by allied healthcare professionals, emergency room visits), disease diagnosis from ICD-9 codes, medication dispensing data, and laboratory data (hematology, renal and liver function tests, glycemic and lipid indexes). These variables were selected based on published literature, prior knowledge, and data availability within the EHR. We curated a total of 258 predictors for model derivation. A full list of the predictors and how they were represented in the source data systems and in the prediction models is available in S2 Table.

Prediction horizon and observational period
The prediction horizon (PH) is the time period between model forecasting and the occurrence of a predicted event [20]. In this study, we adopted a 12-month PH as a balance between SH event rate and clinical utility. To enrich the number of events available for model training, we allowed individuals to have multiple SH events during the whole investigation period.
We defined the 12-month period prior to the date of hospitalization due to SH as the observational period of the cases. For subjects free of SH events (controls), we used the calendar year as the observational period. We excluded individuals who died in the same year as event onset for cases and during the observational period for controls. We calculated summary statistics for laboratory predictors within the observational periods (mean, median, maximum, and minimum), which are referred to as annual-based values hereinafter. We used 5 consecutive years (2013 to 2017) of data to develop a model that predicted SH events leading to hospitalizations in the subsequent 12 months (2014 to 2018) (Fig 1).
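
The annual-based values described above can be sketched as follows. This is an illustrative helper, not the study's code: the function name, the 365-day window length, and the choice to exclude measurements taken on the index date itself are all assumptions.

```python
from datetime import date, timedelta
from statistics import mean, median


def annual_summary(measurements, index_date, window_days=365):
    """Summarize one laboratory test over the observational period.

    measurements: iterable of (measurement_date, value) pairs.
    index_date:   end of the observational period (for a case, the date of
                  the SH hospitalization; for a control, the end of the year).
    Returns None when no value falls inside the window, so that the caller
    can treat the predictor as missing.
    """
    start = index_date - timedelta(days=window_days)
    values = [v for d, v in measurements if start <= d < index_date]
    if not values:
        return None
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "min": min(values),
    }
```

Returning `None` for an empty window keeps missingness explicit, which matters for the indicator variables described in the next section.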

Missing data
Because missingness in EHR data is not completely at random, we additionally defined dummy variables for laboratory predictors as surrogates of the factors that led to the missingness. We discarded the annual-based values of predictors with missingness >50% and retained only the dummy indicators as the surrogates of these predictors [21]. We imputed the missing values of the remaining features using the cohort median.
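
The three steps above (missingness indicators, dropping values with >50% missingness, and median imputation) can be sketched in a few lines. The function below is a simplified illustration under stated assumptions: rows are plain dictionaries, `None` marks a missing value, and the `_missing` suffix for indicator names is hypothetical.

```python
from statistics import median


def handle_missing_labs(rows, lab_features, max_missing=0.5):
    """Return copies of rows with indicators added and gaps handled.

    For every laboratory feature, a 0/1 indicator "<feature>_missing" is
    added. If more than max_missing of the rows lack the feature, its values
    are dropped and only the indicator is kept; otherwise missing values are
    replaced by the median of the observed values.
    """
    if not rows:
        return []
    out = [dict(r) for r in rows]
    for feat in lab_features:
        observed = [r[feat] for r in rows if r.get(feat) is not None]
        missing_frac = 1 - len(observed) / len(rows)
        drop_values = missing_frac > max_missing
        fill = median(observed) if observed else None
        for src, dst in zip(rows, out):
            is_missing = src.get(feat) is None
            dst[feat + "_missing"] = int(is_missing)
            if drop_values:
                dst.pop(feat, None)      # keep only the indicator
            elif is_missing:
                dst[feat] = fill         # cohort-median imputation
    return out
```

In this sketch the median is taken over whatever rows are passed in; which subset of the cohort supplied the median in the study is not stated in the text.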

Model development
We applied 6 supervised ML algorithms for training the SH risk prediction models [22]. These algorithms included generalized linear model (GLM), distributed random forest (DRF), gradient boosting machine (GBM), Rulefit, deep neural network (DNN), and extreme gradient boosting (XGBoost). The whole cohort was randomly split into training (70%), testing (20%), and validation sets (10%) (Fig 1). The models were developed using the training set and optimized via hyper-parameter tuning in the testing set. For benchmarking, we also applied the same 6 ML algorithms to train models using only 11 variables that were previously reported to predict hypoglycemia. The best 11-variable model approximates a conventional strategy of SH risk prediction [9-13]. These risk factors included age, sex, history of SH, hypertension, blood glucose (HbA1c and FPG), urinary albumin-to-creatinine ratio, estimated glomerular filtration rate (eGFR) derived from serum creatinine using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation, and use of metformin, SU, and insulin. We evaluated the models derived from different algorithms and hyper-parameters based on their performance in the validation set. Model development was conducted on the H2O platform (package version: 3.36.1.1) in the R environment (www.r-project.org) [23].
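
For readers who want to reproduce the partitioning step outside H2O, a 7:2:1 random split can be sketched as below. This is not the H2O call used in the study; the function name and the fixed seed are assumptions, and the text does not state whether the split was made at the record or the patient level, so the sketch simply splits whatever identifiers it is given.

```python
import random


def split_7_2_1(ids, seed=0):
    """Randomly partition ids into training, testing, and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = len(shuffled) * 7 // 10       # integer arithmetic avoids
    n_test = len(shuffled) * 2 // 10        # floating-point rounding
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```

Passing unique patient identifiers rather than record identifiers would keep all records of one individual in the same partition, which is a design choice the reader would need to make.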

Hyper-parameter optimization
When training the ML models, we conducted hyper-parameter tuning using either the default setting or a random grid search strategy implemented in the H2O package. In particular, we specified a set of values for the key hyper-parameters that affected the learning rate of each ML algorithm (S3 Table). By random grid search, all possible combinations of hyper-parameter values were sampled uniformly from the hyper-parameter space to train the model. We then selected the combination of hyper-parameters that optimized the area under the precision-recall curve (AUPRC) in the testing data for the final models.
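
The random grid search described above can be illustrated generically, independent of H2O. In the sketch below, `score_fn` is a hypothetical stand-in for "train a model with these hyper-parameters and return its AUPRC on the testing set"; the hyper-parameter names and values in the usage example are illustrative and are not those listed in S3 Table.

```python
import itertools
import random


def random_grid_search(grid, score_fn, max_models, seed=0):
    """Sample hyper-parameter combinations uniformly and keep the best.

    grid:       dict mapping hyper-parameter name -> list of candidate values.
    score_fn:   callable taking a dict of hyper-parameters and returning a
                score to maximize (for example, AUPRC on the testing set).
    max_models: maximum number of combinations to evaluate.
    Returns (best_score, best_params).
    """
    names = sorted(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    rng = random.Random(seed)
    sampled = rng.sample(combos, min(max_models, len(combos)))
    best = None
    for combo in sampled:
        params = dict(zip(names, combo))
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best
```

As a usage example, `random_grid_search({"learn_rate": [0.01, 0.05, 0.1], "max_depth": [3, 6, 9]}, score_fn, max_models=5)` would evaluate 5 of the 9 possible combinations, sampled without replacement.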

Model comparison
We evaluated the model performance using discrimination and calibration metrics. Given the limitations of considering AUROC alone as a discrimination metric, we also considered precision and recall. The former, also known as positive predictive value (PPV), is a measure of the ability of the model to correctly predict a patient as having hypoglycemia, computed as true positives/(true positives + false positives). The latter, also known as sensitivity, is a measure of the ability of the model to label as hypoglycemic all patients who did indeed develop hypoglycemia, computed as true positives/(true positives + false negatives). The AUPRC was computed at the threshold that yielded the maximum F1 score in the validation set. The F1 score, calculated as the harmonic mean of precision and recall, measures how well the prediction model can identify all the positive cases while avoiding labeling negative controls as positive. We considered the model with the highest AUPRC as the best. Calibration, the extent to which the predicted risk scores accurately estimate the observed values, was visually assessed by a calibration plot. We compared the observed and predicted risk of SH at 12 months in the validation set by ranking subjects into deciles of predicted risk. In addition, we generated risk probabilities for the outcome event using the best ML model in the training data, and scaled the probabilities to a continuous score from 1 to 100 by uniform quantile transformation. We then applied this scaling scheme to the validation set. The score cut-off that enriched 90% of events in the validation data was selected as the threshold for risk stratification.
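
The definitions of precision, recall, and F1 given above translate directly into code. The sketch below is a plain-Python illustration rather than the H2O implementation; the function names are ours, and treating a predicted probability greater than or equal to the threshold as a positive prediction is an assumption.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Precision (PPV), recall (sensitivity), and F1 at one threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def max_f1_threshold(y_true, y_prob):
    """Return the observed probability that maximizes F1 when used as cut-off."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: precision_recall_f1(y_true, y_prob, t)[2])
```

Scanning only the observed probabilities is sufficient because F1 can change only when the threshold crosses one of them.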

External temporal validation
To assess the performance of the developed model [24], we performed validation in a separate temporal cohort selected from the Hong Kong Diabetes Register (HKDR) [18]. The HKDR is an ongoing prospective register-based cohort of individuals with diabetes since 1995 who have undergone structured diabetes assessments at one of the HA hospitals, Prince of Wales Hospital, Hong Kong SAR. The HKDR cohort was periodically linked to the territory-wide Clinical Management System (CMS) for capturing of laboratory data, treatment, hospitalizations, and death. Characteristics of patients in the HKDR are described elsewhere [18]. This validation cohort was composed of patients with diabetes aged 65 years or above in 2018 and alive by the end of 2019. The same definitions for predictors, outcome, and observational periods were used as in the previous analysis.

Variable importance
We sought to understand how the different variables contributed to the predictions by the XGBoost model (the selected best predictive model). We calculated the variable importance using tree-based algorithms by the H2O platform. The variable importance is computed from the gains of their respective loss functions during tree construction [25]. Additionally, we used Shapley additive explanation (SHAP) values to understand the contribution of each predictor variable in the temporal validation cohort [26].

Sensitivity analysis
To interrogate the transferability of our ML model, we additionally performed 2 sensitivity analyses by restricting the predictors to the top 30 variables revealed by variable importance of our XGBoost model. We first re-trained the model using all 30 predictors and the same training dataset. We then re-trained a second model using 22 predictors that were selected from the top 30 and were considered to be more accessible in routine healthcare and less region-specific. The excluded predictors were mainly outpatient specialty, triage category during ED attendance, ward care type and length of stay during inpatient admission, procedure times, and district of residence. The re-trained models were optimized in the testing cohort and then evaluated in both internal and external temporal validation cohorts.

Statistical methods
We presented descriptive statistics as means (standard deviations) or medians (interquartile ranges) to characterize individuals across different years or groups. The ANOVA and χ² tests were employed to compare differences across multiple groups for continuous and categorical variables, respectively.

Demographic characteristics
From January 1, 2013 to December 31, 2017, we identified 1,456,618 patient records of 364,863 individuals with diabetes aged above 65 with valid observational periods (2013 to 2017). The mean age was 74.4 ± 8.0 years and 46.6% of the patients were male. A total of 9,616 unique patients had been hospitalized due to SH during PHs, from which we identified 11,128 outcome events. The prevalence of hospitalization due to SH declined from 1.3% in 2014 to 0.4% in 2018 (Table 1). Compared with patients without SH hospitalizations, patients who developed SH hospitalizations were older (77.9 ± 7.6 versus 74.4 ± 8.0 years), had more in-patient (3.7 versus 2.1 times/year) and out-patient records (18.2 versus 15.1 times/year), and more often had a history of SH (10.0% versus 0.7%). Meanwhile, they were more likely to be taking SU (67.4% versus 43.4%), insulin (48.2% versus 12.3%), and dipeptidyl peptidase-4 inhibitors (DPP-4is) (16.9% versus 7.8%), but were less likely to be taking lipid-regulating drugs (14.9% versus 67.7%) (S4 Table). Distributions of baseline characteristics were similar in training, testing, and validation sets (S4 Table).

Model performance
All the ML algorithms, including the conventional models using only 11 variables, yielded high AUROC values above 0.8 in training, testing, and validation sets (Tables 2 and S5). Among them, the model based on the XGBoost algorithm had the best performance concerning false positives and false negatives in the internal validation set (AUPRC = 0.670; PPV = 0.848). The best ML model based on 11 conventional variables, however, only yielded an AUPRC of 0.280 (XGBoost algorithm; S5 Table). We assessed the model calibration by splitting the validation set into deciles ordered by predicted probability of risk, where the XGBoost-based model demonstrated a good concordance between the observed and predicted events (S1 Fig). We selected a scaled risk probability of 86 as the threshold for risk stratification since approximately 90% of cases were enriched in individuals with scaled scores greater than this cut-off in the validation dataset (S6 Table).
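
One way to read the scaled score and its cut-off is sketched below. This is our interpretation under stated assumptions, not the study's procedure: the 1 to 100 score is taken to be a linear rescaling of the empirical quantile of a probability among the training-set probabilities, and the cut-off is taken to be the highest score at which at least the requested share of events score at or above it. Both function names are hypothetical.

```python
import math
from bisect import bisect_right


def make_quantile_scaler(train_probs):
    """Build a function mapping a probability to a 1-100 score.

    The score reflects the share of training-set probabilities that are
    less than or equal to the given probability.
    """
    reference = sorted(train_probs)
    n = len(reference)

    def scale(prob):
        return 1.0 + 99.0 * bisect_right(reference, prob) / n

    return scale


def cutoff_capturing(scores, labels, capture=0.9):
    """Highest cut-off such that >= capture of events have score >= cut-off."""
    event_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not event_scores:
        raise ValueError("no events in the data")
    k = math.ceil(capture * len(event_scores))
    return event_scores[k - 1]
```

The scaler is fitted on training probabilities and then applied unchanged to validation probabilities, which mirrors the order of operations described in the Methods; the comparison here is "at or above" the cut-off, a boundary convention that the text does not specify.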

Variable importance
Fig 2A shows the top predictors out of 258 variables and their relative importance in the XGBoost model. The non-use of lipid-regulating drugs, use or historical use of insulin, number of in-patient records, triage category of "urgent" during ED attendance, and use of SU were the top 5 most important variables in the prediction model. Apart from well-established predictors such as medications, age, FPG, and history of SH, the XGBoost model also identified novel variables that could inform the risk of SH including, for example, outpatient appointment specialty, types of ward care, and the district of residence. Sensitivity analysis restricted to the top 30 predictors revealed that the predictive power of our model could be mostly retained by these top predictors (validation AUPRC = 0.632 versus 0.670 for the full-predictor model). Additional sensitivity analysis by further exclusion of region-specific predictors resulted in a 22-variable model with a moderate drop in performance (validation AUPRC = 0.540; S5 Table).

Discussion
SH poses a great healthcare burden for patients with diabetes, with potentially life-threatening consequences, particularly in older adults. In this study, we integrated comprehensive EHR data and advanced ML algorithms to develop a risk prediction model for one-year SH hospitalization in older patients with diabetes. Compared with a model built upon conventional predictors and algorithms, our model achieved improved AUROC and better precision-recall (AUPRC of 0.670 versus 0.097 for the 11-variable generalized linear model). Based on routinely captured EHR data, this model has the potential to serve as a decision support tool that can be readily integrated into the territory-wide EHR system locally. Although many models for hypoglycemia prediction have been proposed [15,27-31], the accuracy of these methods was only valid for short-term prediction in in-patient settings [22]. However, early prediction, which leaves clinicians sufficient time to adjust or redesign personalized therapeutic strategies, is more desirable for preventing SH in older adults. In addition, these models were prone to raising false alarms, leading to inappropriate treatment de-intensification and potentially increasing the risk of hyperglycemia. Our model achieved a good precision-recall of 0.670 given that the prevalence of SH requiring hospitalization was only around 1%. In another ML model which was developed to predict near-term hypoglycemia, PPV was 0.09 [28].
Against this background, our EHR-based ML model offers a highly efficient and low-cost approach to predicting the 12-month risk of hospitalization due to SH in older adults with diabetes. Our model utilized annual-based summary statistics to reduce the variance and increase the reliability of predictors. Our model relied on EHR data which can be updated in real time as the value of any included predictor changes. Our model was proposed in line with the aim of precision medicine, where more intensive monitoring and interventions for reducing risk of SH are focused on the minority of older patients in the high-risk category. In the majority of patients, the usual strategy to optimize glucose control can be adopted, accompanied by education to increase the awareness of hypoglycemia. Given the close associations of age with many risk factors for SH, a model developed in an older age group will improve the precision in identifying the very high-risk subjects for corrective action without compromising glycemic control in low-risk elderly patients.
In our ML model, we included more than 250 variables that were potentially predictive of SH hospitalizations. Our model also ranked demographic variables, such as default for specialty clinic appointments and residence in a district with a higher index of deprivation, among the top predictors [32]. These associations reiterated the close inter-relationship among multiple morbidities including SH, frailty, and low socioeconomic status, which had not been highlighted by previous models [33,34]. Apart from confirming known clinical risk factors, such as use of insulin and SU, and history of SH events [3], our model has also revealed novel factors associated with SH. For example, non-use of lipid-regulating drugs was identified as the most discriminative predictor of higher risk of SH in both the development and replication datasets. Although our analyses cannot be used to infer causation, the associations are plausible as statins are known to increase insulin resistance and worsen glucose tolerance [35]. Alternatively, non-use of lipid-lowering drugs may be a marker of frailty or other shared risk factors for SH.
Our work has several strengths. This is the first risk prediction model for SH leading to hospitalizations in older adults with diabetes. We included over 1 million records for model training and validation, using over 250 multidimensional variables from a territory-wide EHR to build the model. We used annual-based summary statistics of variables to increase the stability of our model, making it less prone to errors due to outliers, sporadic data, or noisy laboratory test values, which are common features of EHR data. We also benchmarked multiple supervised ML algorithms to obtain the optimized model. In addition to AUROC, we also presented the AUPRC, which was often omitted by previous studies due to the rarity of SH events in previous databases. Our advanced ML algorithms considered both nonlinear associations and interactions among predictors to identify both conventional and novel risk factors, with better performance than conventional methods. The model also takes into account the missing values of predictors, making it a useful decision support tool in a healthcare system. Finally, we validated our model using a temporal cohort that confirmed the robustness of our model for future prospective validation and implementation.
Our study also has limitations. First, we demonstrated temporal but not geographical transportability of our model. We utilized a territory-wide dataset in Hong Kong covering all public hospitals, which are linked in our training dataset. The transportability of our model to other regions, countries, ethnicities, and healthcare systems is unknown given that population characteristics are likely to be different. Similarly, the threshold we currently selected for risk stratification requires recalibration when applied to other cohorts. However, as many of our top predictors and variables such as hospital attendance, drug use, and history of SH are commonly available in most EHR systems, we expect our work can inspire similar studies where our model can be adapted and calibrated to other settings. This was also supported by our sensitivity analyses, where the model performance was still comparable when restricted to the top 30 predictors. Second, our EHR system did not capture lifestyle-related variables (e.g., diet, exercise) or self-monitoring of blood glucose. Our data also did not include anthropometric parameters, which are important for dose-related calculations for medication exposure and can also reflect nutritional or health states. Finally, we used principal hospital discharge diagnostic codes to define SH events in this study, which may underestimate the number of SH events that required third-party assistance but did not require hospital admission. Our model also does not predict non-severe hypoglycemia, which is mostly self-reported and not captured within EHRs. However, Karter and colleagues demonstrated that their tool predicting 12-month hypoglycemia-related ED or hospital use showed high agreement with self-reported SH [9]. Further, they demonstrated that their tool also predicted continuous glucose monitoring (CGM)-detected hypoglycemia (time <50 mg/dL) with high accuracy [36]. We plan to evaluate our ML model for non-severe and CGM-detected hypoglycemia in prospective studies.
In summary, we have developed a one-year EHR-based ML risk prediction model for SH leading to hospitalizations in older adults with diabetes using multidimensional EHR data. The model outperforms conventional models in AUROC and precision-recall, with a reduced number of false positives that might lead to unnecessary interventions, with implications for both patients and the healthcare system. Given the increasing use of EHR, our ML model can be developed into a decision-making tool to alert physicians to implement early preventive actions, such as de-prescribing or treatment reconciliation. There is also growing evidence for use of technologies in older adults, such as CGM with hypoglycemic alerts [37,38]. We anticipate that the proposed model could be embedded in a DSS to provide regular SH risk screening for older Chinese people with diabetes [37]. Such a program can be particularly effective if combined with a regular comprehensive diabetes assessment program that allows periodic review of clinical state for quality assurance [18]. Implementation studies are needed to define the logistics of an ML-based hypoglycemia risk stratification tool with patient-centered decision support and to evaluate its impacts on clinician and patient behavior, change of medications, clinical outcomes, and cost-effectiveness.

Fig 2. Scaled relative importance of top 30 predictors from the XGBoost model in the validation set (A) and contribution of the top 20 features of the XGBoost model in the temporal validation set (B). (A) Variable importance plot that shows the relative importance of top 30 predictors from the XGBoost model in the validation set. A&E, Accident and Emergency attendance; FPG, fasting plasma glucose; 1 yr minimum/mean/maximum, the minimum/mean/maximum of all the values of the corresponding laboratory test in the recent 12 months; LDL-C, low-density lipoprotein cholesterol. "Insulin" included both insulin use and ever use. (B) SHAP (SHapley Additive exPlanations) summary plot that shows the contribution of the top 20 features of the XGBoost model in the temporal validation set. Each feature corresponds to a continuous variable or a certain category of a categorical variable. One dot per subject per feature is colored according to the attribution value of the feature, where red represents a higher value (or "1" for a binary feature) and blue represents a lower value (or "0" for a binary feature). The features are ordered by descending contribution to the XGBoost model. For example, non-use of lipid-regulating drugs (red color) is associated with the highest discriminative value for increased risk of SH (SHAP contribution >0), whereas non-use of sulfonylureas (red color) is associated with the discriminative value for reduced risk of SH (SHAP contribution <0). https://doi.org/10.1371/journal.pmed.1004369.g002

Table 1. Characteristics of SH requiring hospitalization among older adults with diabetes in 2013-2017.
To evaluate the robustness of the XGBoost-based prediction model against the training data collection period (2013 to 2018), we applied the final model to a temporal validation cohort from the HKDR. The HKDR cohort included predictors collected in 2018 and the occurrence of SH hospitalizations in 2019, and comprised 14,295 valid patient records in 13,917 patients aged 65 or above in 2018, diagnosed with diabetes, and alive by the end of 2019 (Table 1). Using the same outcome definition, we identified 722 SH hospitalizations. The XGBoost-based prediction model yielded an AUROC of 0.856 and an AUPRC of 0.286 in this separate cohort. Top features revealed by the SHAP values showed high consistency, where non-use of lipid-regulating drugs in the recent 12 months had the largest discriminative power to indicate risk of hospitalizations due to SH. Fewer in-patient records, use of SU and insulin, more outpatient records, more urgent or semi-urgent triage at ED attendance, and lower annual minimum of FPG were associated with increased risk of SH in the subsequent 12 months in this separate validation cohort (Fig 2B).